Abstract — The normal operation of a power system relies on accurate state estimation that faithfully reflects the physical aspects of the electrical power grid. However, recent research shows that carefully synthesized false-data injection attacks can bypass the security system and introduce arbitrary errors into state estimates. In this paper, we use graphical methods to study defense mechanisms against false-data injection attacks on power system state estimation. By securing carefully selected meter measurements, we ensure that no false-data injection attack can be launched to compromise any given set of state estimates. We characterize the optimal protection problem, which protects the state estimates with the minimum number of measurements, as a variant of the Steiner tree problem in a graph. Based on this graphical characterization, we propose both exact and reduced-complexity approximation algorithms. In particular, we show that the proposed tree-pruning based approximation algorithm significantly reduces computational complexity while yielding negligible performance degradation compared with the optimal algorithms. The advantageous performance of the proposed defense mechanisms is verified on IEEE standard power system testcases.

Index Terms — False-data injection attack, power system state estimation, smart grid security, graph algorithms.



I. Introduction 
A. Motivations and summary of contributions

The current power systems are continuously monitored and controlled by EMS/SCADA (Energy Management System and Supervisory Control and Data Acquisition) systems in order to maintain the operating conditions in a normal and secure state [1]. In particular, the SCADA host at the control center processes the received meter measurements using a state estimator, which filters out incorrect data and derives the optimal estimate of the system states. These state estimates are then passed on to the EMS application functions, such as optimal power flow, to control the physical aspects of the electrical power grid.

However, the integrity of state estimation is under mounting threat as we gradually transform the current electricity infrastructure into future smart power grids. Smart power grids are more open to and physically accessible by outside networks, such as office local area networks and smart meters that allow two-way communication between energy consumers and suppliers. With these entry points introduced into the power system, potentially complex and collaborating malicious attacks are brought in as well. Liu et al. [2] showed that a new false-data injection attack could circumvent bad data detection (BDD) in today's SCADA systems and introduce arbitrary errors into state estimates without being detected. Such an attack is referred to as an undetectable false-data injection attack. A recent experiment in [3] demonstrated that the attack can cause a state-of-the-art EMS/SCADA state estimator to produce a bias of more than 50% of the nominal value without triggering the BDD alarm. Biased estimates could directly lead to serious social and economic consequences. For instance, [4] and [5] showed that attackers capable of data injection can manipulate electricity prices in the power market. Worse still, [6] warned that the attack can even cause a regional blackout.

A common approach to mitigating false-data injection attacks is to secure meter measurements by, for example, guards, video monitoring, or tamper-proof communication systems, to evade malicious injections [7]-[9]. Recent studies have proposed a number of methods to select meter measurements for protection. For instance, [7] proved that it is necessary and sufficient to protect a set of basic measurements so that no undetectable false-data injection attack can be launched. However, the protection scheme in [7] is costly in that the size of a set of basic measurements equals the number of unknown state variables in the state estimation problem, which could be up to several hundred in a large-scale power system. On the other hand, despite the large number of unknown state variables, only some of them are considered critical to maintaining the normal operation of power systems, such as the state variables used to maintain voltage stability and the synchronism among generators [10], [11]. Therefore, it is valuable to devise a method that gives priority to defending the state estimates of greatest interest. In this paper, we focus on using graphical methods to derive efficient strategies that defend any set of critical state estimates with the minimum number of secure measurements. Our detailed contributions are listed as follows.

• We derive conditions for selecting a set of meter measurements such that, if the selected meters are secured, no undetectable attack can be launched to compromise the critical state estimates. The conditions are particularly useful for formulating the optimal protection problem that defends the critical states at minimum cost.
• We characterize the optimal protection problem as a variant of the Steiner tree problem in a graph. Then, two exact solution methods are proposed: a Steiner vertex enumeration algorithm and a mixed integer linear programming (MILP) formulation derived from a network flow model. In particular, the proposed MILP formulation reduces the computational complexity by exploiting the graphical structure of the optimal solution.
• To tackle the intractability of the problem, we also propose a polynomial-time tree-pruning heuristic (TPH) algorithm. With a proper parameter, simulation results show that it yields close-to-optimal solutions while significantly reducing the computational complexity. For instance, TPH solves a problem on a 300-bus testcase in seconds, which may take days with the MILP formulation.

B. Related works 

State estimate protection is closely related to the concept of power network observability. Conventional power network observability analysis studies whether a unique estimate of all unknown state variables can be determined from the measurements [1]. From the attacker's perspective, [2] proved that an undetectable attack can be formulated if removing the measurements it compromises makes the power system unobservable. Conversely, [7] showed that no undetectable attack can be formulated if the power system is observable from the protected meter measurements. In this paper, we extend the conventional notion of power network observability to a generalized state estimate observability in order to study protection mechanisms for any set of critical state estimates.

Graphical methods are commonly used for power system observability analysis. The early work by Krumpholz et al. [12] showed that a power system is observable if and only if it contains a spanning tree that satisfies certain measurement-to-transmission-line mapping rules. A follow-up work presented a max-flow method to find such a mapping and thereby examine the observability of a power network [13]. A few recent papers have also applied graphical methods to study attack and defense mechanisms for false-data injection. For instance, based on the results in [12], [14] proposed an algorithm to quantify the minimum-effort undetectable attack, i.e. the non-trivial attack that compromises the fewest meters without being detected. Besides, [15] used a min-cut relaxation method to calculate the security indices defined in [16], which quantify the resistance of meter measurements to injection attacks. A similar min-cut approach was also applied in [17] to identify the critical points in the measurement set, the loss of which would render the power system unobservable.

The problem of defending a set of critical state estimates against undetectable attacks was first studied in our earlier work [18], where we proposed an arithmetic greedy algorithm that finds the minimum set of protected meter measurements by gradually expanding the set of secure state estimates. However, the computational complexity of the greedy algorithm can be prohibitively high in large-scale power systems. For instance, it may take years to obtain a solution in a 57-bus system. In contrast, in this paper we study optimal protection from a graphical perspective. By exploiting the graphical structure of the optimal solution, the proposed MILP formulation obtains the optimal solution with significantly reduced complexity. In addition, we propose a pruning-based heuristic that yields near-optimal solutions in polynomial time.



The rest of this paper is organized as follows. In Section II, we introduce some preliminaries on state estimation and false-data injection attacks. We characterize the optimal protection problem in a graph in Section III and propose efficient algorithms in Section IV. Simulation results are presented in Section V. Finally, the paper is concluded in Section VI.

II. Preliminaries

A. DC measurement model and state estimation 

We consider the linearized power network state estimation problem in a steady-state power system with $n+1$ buses. The states of the power system include the bus voltage phase angles and voltage magnitudes. The voltage magnitudes can often be measured directly, while the phase angles need to be obtained from state estimation [19]. In the linearized (DC) measurement model, we assume that the voltage magnitudes at all buses are known and estimate the phase angles from the active power measurements, i.e. the active power flows along the power lines and the active power injections at buses [1]. By choosing an arbitrary bus as the reference with zero phase angle, the network state consisting of the $n$ unknown voltage phase angles is captured in a vector $\theta = (\theta_1, \theta_2, \ldots, \theta_n)^T$. In the DC measurement model, the $m$ received measurements $z = (z_1, z_2, \ldots, z_m)^T$ are related to the network states as

$$ z = H\theta + e. \tag{1} $$

Here, $H$ is the $m \times n$ measurement Jacobian matrix [1], and $e \sim \mathcal{N}(0, R)$ is independent measurement noise with covariance $R$. When $H$ has full column rank, i.e. $\operatorname{rank}(H) = n$, the maximum likelihood estimate is given by

$$ \hat{\theta} = (H^T R^{-1} H)^{-1} H^T R^{-1} z \triangleq P z. \tag{2} $$

Since $\operatorname{rank}(H) = n$ cannot exceed $m$, the number of rows of $H$, at least $n$ meters are needed to derive a unique state estimate. Meanwhile, the other $m - n$ measurements provide redundancy that improves the resistance against random errors.
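As a concrete check of (2), here is a minimal sketch of the weighted least-squares (maximum likelihood) estimator. It assumes a hypothetical 4-bus network (bus 0 is the reference, unit line susceptances, flow meters on lines 0-1, 1-2, 2-3 and an injection meter at bus 3), a diagonal $R$ whose inverse is passed as `weights`, and exact `Fraction` arithmetic in place of floating point; the helper names are illustrative only.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # pick a nonzero pivot (A is assumed nonsingular)
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def wls_estimate(H, z, weights):
    """theta_hat = (H^T R^-1 H)^-1 H^T R^-1 z, with R^-1 = diag(weights)."""
    Ht = transpose(H)
    HtW = [[h * w for h, w in zip(row, weights)] for row in Ht]        # H^T R^-1
    G = [[sum(a * b for a, b in zip(r, c)) for c in Ht] for r in HtW]  # H^T R^-1 H
    rhs = [sum(a * b for a, b in zip(r, z)) for r in HtW]              # H^T R^-1 z
    return solve(G, rhs)


# Hypothetical example; columns are theta1, theta2, theta3.
H = [[-1, 0, 0], [1, -1, 0], [0, 1, -1], [0, -1, 2]]
theta = [Fraction(1, 10), Fraction(-1, 5), Fraction(3, 10)]
z = [sum(h * t for h, t in zip(row, theta)) for row in H]  # noiseless z = H theta
print(wls_estimate(H, z, [1, 2, 1, 4]))
# [Fraction(1, 10), Fraction(-1, 5), Fraction(3, 10)]
```

With noiseless measurements and a full-column-rank $H$, the estimate recovers $\theta$ exactly for any positive weights, which is a quick sanity check of the normal equations.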

Errors could be introduced for various reasons, such as device misconfiguration and malicious attacks. Current power systems use a BDD mechanism to remove bad data, assuming that the errors are random and unstructured. It calculates the residual $r = z - H\hat{\theta}$ and compares its $\ell_2$-norm with a prescribed threshold $\tau$. A measurement $z$ is identified as a bad data measurement if

$$ \|r\| = \|z - H\hat{\theta}\| = \|(I - HP)e\| > \tau. \tag{3} $$

Otherwise, $z$ is considered a normal measurement.

B. Undetectable attacks and protection model 

Suppose that attackers inject malicious data $a = (a_1, a_2, \ldots, a_m)^T$ into the measurements. Then, the received measurements become

$$ \bar{z} = z + a = H\theta + a + e. \tag{4} $$

In general, $a$ is likely to be identified by the BDD if it is unstructured. Nevertheless, it was found in [2] that some well-structured injections, such as those with $a = Hc$, can bypass BDD. Here, $c = (c_1, c_2, \ldots, c_n)^T$ is an arbitrary vector. This can be verified by calculating the residual in (3): since $PH = I$, we have $P\bar{z} = \hat{\theta} + c$, and thus

$$ \|\bar{z} - HP\bar{z}\| = \|z + a - H(\hat{\theta} + c)\| = \|z - H\hat{\theta}\|. \tag{5} $$

The same residual is obtained as if no malicious data were injected. Therefore, a structured attack $a = Hc$ will not be detected by BDD. In this case, the system operator would mistake $\hat{\theta} + c$ for a valid estimate, and thus an error vector $c$ has been introduced without being detected.

Fig. 1. A measurement placement for the IEEE 14-bus testcase.
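The identity (5) can be checked numerically. The sketch below reuses the same hypothetical 4-bus example (unit susceptances, $R = I$, exact `Fraction` arithmetic; all names are illustrative) and compares the squared BDD residual before and after a structured injection $a = Hc$, and after an unstructured injection on a single meter.

```python
from fractions import Fraction


def solve(A, b):
    """Exact Gauss-Jordan solve of the square system A x = b (A nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def estimate(H, z):
    """Least-squares estimate with R = I: theta_hat = (H^T H)^-1 H^T z."""
    Ht = [list(c) for c in zip(*H)]
    G = [[sum(a * b for a, b in zip(r, c)) for c in Ht] for r in Ht]
    rhs = [sum(a * b for a, b in zip(r, z)) for r in Ht]
    return solve(G, rhs)


def residual_sq(H, z):
    """Squared l2-norm of r = z - H theta_hat, the quantity BDD compares with tau^2."""
    th = estimate(H, z)
    r = [zi - sum(h * t for h, t in zip(row, th)) for zi, row in zip(z, H)]
    return sum(v * v for v in r)


H = [[-1, 0, 0], [1, -1, 0], [0, 1, -1], [0, -1, 2]]  # hypothetical 4-bus example
theta = [Fraction(1, 10), Fraction(-1, 5), Fraction(3, 10)]
e = [Fraction(1, 100), Fraction(-2, 100), Fraction(0), Fraction(3, 100)]  # noise
z = [sum(h * t for h, t in zip(row, theta)) + ei for row, ei in zip(H, e)]

c = [Fraction(1, 2), Fraction(-1, 4), Fraction(0)]       # error the attacker wants
a = [sum(h * ci for h, ci in zip(row, c)) for row in H]  # structured attack a = H c
z_bad = [zi + ai for zi, ai in zip(z, a)]
print(residual_sq(H, z) == residual_sq(H, z_bad))        # True: BDD sees no change

a_rand = [0, 0, 0, Fraction(1, 2)]                       # unstructured injection
z_rand = [zi + ai for zi, ai in zip(z, a_rand)]
print(residual_sq(H, z_rand) > residual_sq(H, z))        # True: residual grows
```

The structured attack shifts the estimate by exactly $c$ while leaving the residual untouched, whereas the single-meter injection is not in the range of $H$ and therefore inflates the residual.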

The risks of undetectable attacks can be mitigated if the system operator secures measurements to evade malicious injections. Within this context, we assume that the system operator's objective is to ensure that no undetectable attack can be formulated to compromise a given set of state estimates $\mathcal{D} \subseteq \mathcal{I}$, where $\mathcal{I}$ is the set of all unknown state estimates. That is, $c_i = 0$ for all $i \in \mathcal{D}$. This is achieved by securing a set of meter measurements $\mathcal{P} \subseteq \mathcal{M}$, where $\mathcal{M}$ is the set of all meters. In other words, attackers are not able to inject false data into any protected meter measurement, i.e. $a_i = 0, \forall i \in \mathcal{P}$.

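From [18], securing a set of meters $\mathcal{P}$ eliminates the possibility of an undetectable attack compromising a set of state variables $\mathcal{D}$ if and only if

$$ \operatorname{rank}(H_{\mathcal{P},\mathcal{I}}) = \operatorname{rank}(H_{\mathcal{P},\mathcal{I}\setminus\mathcal{D}}) + |\mathcal{D}|. \tag{6} $$

Here, $H_{\mathcal{P},\mathcal{I}}$ is the submatrix of $H$ consisting of the rows that correspond to $\mathcal{P}$, and $H_{\mathcal{P},\mathcal{I}\setminus\mathcal{D}}$ is the submatrix of $H_{\mathcal{P},\mathcal{I}}$ excluding the columns that correspond to $\mathcal{D}$. Naturally, we are interested in minimizing the cost of protecting the critical states $\mathcal{D}$. For simplicity, we assume for the time being a fixed cost of securing each meter, e.g. manpower or surveillance installation cost. This requires solving the following problem

$$ \min_{\mathcal{P} \subseteq \mathcal{M}} \ |\mathcal{P}| \quad \text{s.t.} \quad \operatorname{rank}(H_{\mathcal{P},\mathcal{I}}) = \operatorname{rank}(H_{\mathcal{P},\mathcal{I}\setminus\mathcal{D}}) + |\mathcal{D}|, \tag{7} $$

which we prove to be NP-hard in the next section.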
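Condition (6) and problem (7) can be stated directly in code. The following sketch (hypothetical 4-bus example as before; meters are row indices of $H$, states are column indices) checks (6) with an exact rank computation and solves (7) by brute-force search over meter subsets in increasing size, which illustrates why (7) is expensive: the search is exponential in $m$.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    M = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        piv = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for r in range(len(M)):
            if r != rk and M[r][col] != 0:
                f = M[r][col] / M[rk][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[rk])]
        rk += 1
    return rk


def protects(H, P, D):
    """Condition (6): securing meter rows P defends the state columns D."""
    HP = [H[i] for i in P]
    keep = [j for j in range(len(H[0])) if j not in D]
    HP_reduced = [[row[j] for j in keep] for row in HP]
    return rank(HP) == rank(HP_reduced) + len(D)


def min_protection(H, D):
    """Brute-force solution of (7): smallest P satisfying (6)."""
    for size in range(len(H) + 1):
        for P in combinations(range(len(H)), size):
            if protects(H, P, D):
                return P
    return None


# Columns: theta1, theta2, theta3; rows: flows 0-1, 1-2, 2-3 and injection at bus 3.
H = [[-1, 0, 0], [1, -1, 0], [0, 1, -1], [0, -1, 2]]
print(protects(H, [1, 2], [0]))  # False: these two flows never touch the reference
print(min_protection(H, [1]))    # (0, 1): secure the flows on lines 0-1 and 1-2
```

Securing only the flows on lines 1-2 and 2-3 leaves a common shift of all three angles undetectable, which is exactly what (6) reports.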

III. Graphical Characterizations of Optimal 
State Estimate Protection 

Interestingly, we show that (7) can be characterized as a variant of the Steiner tree problem in a graph. The results will be used in the next section to develop efficient graphical algorithms.



A. Network observability and state estimate protection 

A power network can be described by an undirected graph, where vertices and edges represent buses and transmission lines, respectively. We use $e_i^1$ and $e_i^2$ to denote the two vertices connected by edge $e_i$, and $\mathcal{N}_j$ to denote the set of edges incident to vertex $v_j$. First, we have the following definitions for meter measurements.

Definition 1: The flow meter on transmission line $e_i$ measures the edge $e_i$ and the two vertices $e_i^1$ and $e_i^2$. The injection meter at bus $v_j$ measures the edge set $\{e_i \mid e_i \in \mathcal{N}_j\}$ and the vertex set $\{e_i^1, e_i^2 \mid e_i \in \mathcal{N}_j\}$.

Definition 2: For a set of meters $\mathcal{M}' \subseteq \mathcal{M}$, $G(\mathcal{M}') = (\mathcal{V}', \mathcal{E}')$ is the measured subnetwork of $\mathcal{M}'$, including all the vertices $\mathcal{V}'$ and edges $\mathcal{E}'$ measured by $\mathcal{M}'$. In particular, $G(\mathcal{M})$ is referred to as the measured full network.

A measurement placement for the IEEE 14-bus testcase is presented in Fig. 1. For instance, the flow meter on $e_{10}$ measures edge $e_{10}$ and vertices $v_5$ and $v_6$. The injection meter $r_{12}$ measures edges $e_1$ and $e_2$, and vertices $v_1$, $v_2$ and $v_5$. The measured subnetwork of these two meters is a closed network consisting of vertices $\mathcal{V}' = \{v_1, v_2, v_5, v_6\}$ and edges $\mathcal{E}' = \{e_1, e_2, e_{10}\}$.
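Definitions 1 and 2 translate into a few lines of set manipulation. This sketch assumes that the edge labels $e_1, \ldots, e_{20}$ of Fig. 1 follow the standard IEEE 14-bus (MATPOWER `case14`) branch order, which is consistent with the example above ($e_1 = [1,2]$, $e_2 = [1,5]$, $e_{10} = [5,6]$); the meter encoding `("flow", edge_id)` / `("inj", bus)` is hypothetical.

```python
# IEEE 14-bus branches in the standard case14 order (assumed to match Fig. 1).
EDGES = {
    1: (1, 2), 2: (1, 5), 3: (2, 3), 4: (2, 4), 5: (2, 5),
    6: (3, 4), 7: (4, 5), 8: (4, 7), 9: (4, 9), 10: (5, 6),
    11: (6, 11), 12: (6, 12), 13: (6, 13), 14: (7, 8), 15: (7, 9),
    16: (9, 10), 17: (9, 14), 18: (10, 11), 19: (12, 13), 20: (13, 14),
}


def measured_subnetwork(meters, edges):
    """Return (V', E') measured by `meters` (Definitions 1 and 2).

    Each meter is ("flow", edge_id) or ("inj", bus)."""
    V, E = set(), set()
    for kind, key in meters:
        if kind == "flow":
            measured = {key}                     # the line itself
        elif kind == "inj":
            measured = {k for k, ends in edges.items() if key in ends}  # N_j
        else:
            raise ValueError(f"unknown meter type: {kind}")
        E |= measured
        for k in measured:                       # both end buses of each edge
            V.update(edges[k])
    return V, E


V, E = measured_subnetwork([("flow", 10), ("inj", 1)], EDGES)
print(sorted(V), sorted(E))  # [1, 2, 5, 6] [1, 2, 10]
```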

Conventional power network observability analysis studies whether a unique estimate of all unknown state variables can be determined [1]. Here, we extend the concept of network observability to a generalized state estimate observability. With a slight abuse of notation, we use a set of vertices $\mathcal{V}'$ to denote the corresponding state variables.

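Definition 3: A set of state variables $\mathcal{D} \subseteq \mathcal{I}$ is observable from a set of meters $\mathcal{P} \subseteq \mathcal{M}$ if and only if a unique estimate of $\mathcal{D}$ can be obtained from the measurements $\mathcal{P}$. Besides, $\mathcal{P}$ is a basic measurement set of $\mathcal{D}$ if $\mathcal{D}$ is observable from $\mathcal{P}$ and $|\mathcal{P}| = |\mathcal{D}|$.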

Remark 1: The conventional definition of network observability is a special case with $\mathcal{D} = \mathcal{I}$ and $\mathcal{P} = \mathcal{M}$. A basic measurement set of $\mathcal{D}$ is a minimum measurement set that ensures the observability of $\mathcal{D}$. However, not every $\mathcal{D}$ has a basic measurement set.

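Definition 4: A measured subnetwork $G(\mathcal{P}) = (\mathcal{V}', \mathcal{E}')$ is an observable subnetwork if and only if all the unknown state variables $\mathcal{S}$ in the subnetwork are observable from $\mathcal{P}$, i.e.

$$ \operatorname{rank}(H_{\mathcal{P},\mathcal{S}}) = |\mathcal{S}|, \tag{8} $$

where $\mathcal{S} = \mathcal{V}' \setminus \{R\}$, with $R$ being the reference bus.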

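Remark 2: An observable subnetwork $G(\mathcal{P}) = (\mathcal{V}', \mathcal{E}')$ is closed in the sense that it consists of the buses $\mathcal{V}'$ and lines $\mathcal{E}'$ measured by $\mathcal{P}$. From (8), $\mathcal{P}$ contains at least a basic measurement set of $\mathcal{V}'$. Besides, $G(\mathcal{P})$ must include the reference bus $R$, i.e. $R \in \mathcal{V}'$. Otherwise, every bus measured by $\mathcal{P}$ would be an unknown state in $\mathcal{S} = \mathcal{V}'$, and since each row of the DC Jacobian sums to zero over the buses it measures, the columns of $H_{\mathcal{P},\mathcal{S}}$ would sum to zero, giving $\operatorname{rank}(H_{\mathcal{P},\mathcal{S}}) < |\mathcal{S}|$.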

We proceed to establish the equivalence between state observability and the state estimate protection criterion.

Theorem 1: Protecting a set of meter measurements $\mathcal{P}$ defends a set of state estimates $\mathcal{D}$ against undetectable attacks if and only if $\mathcal{D}$ is observable from $\mathcal{P}$.

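Proof: We first prove the if part. When $\mathcal{D}$ is observable from $\mathcal{P}$, there must exist an observable subnetwork $G(\mathcal{P}') = (\mathcal{V}', \mathcal{E}')$ that includes $\mathcal{D}$, i.e. $\mathcal{D} \subseteq \mathcal{V}'$ and $\mathcal{P}' \subseteq \mathcal{P}$. From (8), we have $\operatorname{rank}(H_{\mathcal{P}',\mathcal{S}}) = |\mathcal{S}|$, where $\mathcal{S} = \mathcal{V}' \setminus \{R\}$. Since the protected measurements cannot be altered, an undetectable attack $a = Hc$ must satisfy $H_{\mathcal{P}',\mathcal{I}}\, c = 0$. Because $G(\mathcal{P}')$ is closed, $H_{\mathcal{P}',\mathcal{I}\setminus\mathcal{S}} = 0$, so the solution of this system is $c = (0, c_{\mathcal{I}\setminus\mathcal{S}})$, where $c_{\mathcal{I}\setminus\mathcal{S}}$ is an arbitrary vector. That is, no undetectable attack can be formulated to compromise $\mathcal{S}$ if $\mathcal{P}'$ is well protected. Since $\mathcal{D} \subseteq \mathcal{S}$ and $\mathcal{P}' \subseteq \mathcal{P}$, this completes the proof of the if part.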

We then show the only-if part, i.e. that there exists an undetectable attack that compromises $\mathcal{D}$ if $\mathcal{D}$ is unobservable from $\mathcal{P}$. Since random noise is not related to network observability, we neglect $e$ in (1). By the definition of observability, when $\mathcal{D}$ is unobservable from $\mathcal{P}$, there exist a $z_{\mathcal{P}}$ and at least two different estimates of the unknown variables, denoted by $\theta$ and $\tilde{\theta}$, satisfying

$$ z_{\mathcal{P}} = H_{\mathcal{P},\mathcal{I}}\,\theta = H_{\mathcal{P},\mathcal{I}}\,\tilde{\theta}, \tag{9} $$

and $\theta_k \neq \tilde{\theta}_k$ for some $k \in \mathcal{D}$. By letting $c = \theta - \tilde{\theta}$, we have $H_{\mathcal{P},\mathcal{I}}\, c = 0$ and $c_k \neq 0$. Therefore, the attack $a = Hc$ leaves every protected measurement unchanged and compromises state $\theta_k$ without being detected. ■

Remark 3: All the unknown state estimates to be defended, i.e. $\mathcal{D}$, are included in an observable subnetwork constructed from a set of protected meters. In the following subsection, we show that the optimal observable subnetwork has an interesting Steiner tree structure.

B. Graphical equivalence of optimal protection 

The power network observability analysis in [12] showed a connection between network observability and a spanning tree structure. The idea is briefly summarized in Proposition 1.

Proposition 1: The measured full network $G(\mathcal{M}) = (\mathcal{V}, \mathcal{E})$ is observable if and only if $G(\mathcal{M})$ contains a spanning tree each of whose edges is mapped to a meter according to the following rules:

1) an edge is mapped to a flow meter placed on it, if any;

2) an edge without a flow meter is mapped to an injection meter that measures it;

3) different edges are mapped to different meters in $\mathcal{M}$.

Proof: See the proof in [12]. ■
Proposition 1 states that any basic measurement set of $\mathcal{I}$ can be mapped to a spanning tree in the measured full graph. On the other hand, a measured subnetwork $G(\mathcal{P}) = (\mathcal{V}', \mathcal{E}')$, where $\mathcal{P} \subseteq \mathcal{M}$, can also be considered as a closed network whose observability depends only on the components within $G(\mathcal{P})$. Therefore, a measurement-to-edge mapping also exists in an observable subnetwork, as specified below.

Corollary 1: A measured subnetwork $G(\mathcal{P}) = (\mathcal{V}', \mathcal{E}')$ is observable if and only if $G(\mathcal{P})$ contains a tree that connects all vertices in $\mathcal{V}'$, where each edge of the tree is one-to-one mapped to a unique meter in $\mathcal{P}$ that measures it.

Proof: The proof follows by replacing $\mathcal{M}$ with $\mathcal{P}$ in Proposition 1. ■
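For a given candidate tree, the conditions of Corollary 1 can be tested with a connectivity check plus a bipartite matching between tree edges and meters. The sketch below is an illustration under our own assumptions: the tree is supplied rather than searched for, edges are encoded as `frozenset`s, meter names are hypothetical, and the matching (augmenting paths) enforces only the one-to-one mapping, not the flow-meter priority of rule 1) in Proposition 1.

```python
def edge(a, b):
    """Undirected edge [a, b] as a hashable, order-free key."""
    return frozenset((a, b))


def is_spanning_tree(vertices, tree_edges):
    """True if tree_edges form a tree connecting every vertex in `vertices`."""
    vertices = set(vertices)
    if len(tree_edges) != len(vertices) - 1:
        return False
    adj = {v: [] for v in vertices}
    for a, b in tree_edges:
        if a not in adj or b not in adj:
            return False
        adj[a].append(b)
        adj[b].append(a)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:                          # depth-first traversal
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def match_edges_to_meters(tree_edges, measures):
    """One-to-one edge-to-meter mapping via augmenting paths.

    measures: meter name -> set of edges the meter measures.
    Returns {edge: meter}, or None if no such mapping exists."""
    owner = {}  # meter -> edge currently assigned to it

    def assign(e, visited):
        for meter, measured in measures.items():
            if e in measured and meter not in visited:
                visited.add(meter)
                if meter not in owner or assign(owner[meter], visited):
                    owner[meter] = e
                    return True
        return False

    for e in tree_edges:
        if not assign(e, set()):
            return None
    return {e: meter for meter, e in owner.items()}


tree = [edge(1, 2), edge(1, 5), edge(5, 6)]
measures = {"flow_5_6": {edge(5, 6)}, "inj_1": {edge(1, 2), edge(1, 5)}}
print(is_spanning_tree({1, 2, 5, 6}, tree))               # True
print(match_edges_to_meters(tree, measures))              # None: 3 edges, 2 meters
measures["flow_1_2"] = {edge(1, 2)}
print(match_edges_to_meters(tree, measures) is not None)  # True
```

In the first call the subnetwork from the Fig. 1 example has three edges but only two meters, so no one-to-one mapping exists; adding a flow meter on $[1,2]$ makes the mapping, and hence observability, possible.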

From Remark 3 and Corollary 1, we see that the unknown state estimates to be defended are indeed contained in a tree constructed from a protected meter measurement set. Therefore, we propose the following minimum measured Steiner tree (MMST) problem in a graph, which is equivalent to the optimal state estimate protection problem (7).

MMST problem: Given the measured full graph $G(\mathcal{M}) = (\mathcal{V}, \mathcal{E})$, to protect a set of state estimates $\mathcal{D}$ at minimum cost, the MMST problem finds a shortest Steiner tree $\mathcal{T}^* = (\mathcal{V}^*, \mathcal{E}^*)$ (with the minimum number of edges) and a set of meters $\mathcal{P}^* \subseteq \mathcal{M}$ that satisfy the following conditions.

1) $\mathcal{V}^*$ is the set of all vertices measured by $\mathcal{P}^*$;

2) $\mathcal{D} \subseteq \mathcal{V}^*$ and $R \in \mathcal{V}^*$;

3) each edge in $\mathcal{E}^*$ is one-to-one mapped to a unique meter in $\mathcal{P}^*$ that measures it.

Then, the set of meters $\mathcal{P}^*$ is the optimal solution to (7).

Fig. 2. An illustration of MMST from the IEEE 14-bus testcase.

We call the problem a Steiner tree problem, instead of a spanning tree problem, because $\mathcal{T}^*$ in general connects only a subset of the vertices in the measured full graph. The three conditions ensure that all the unknown state estimates in $\mathcal{T}^*$, including $\mathcal{D}$, are observable from $\mathcal{P}^*$. To illustrate, we present an example based on Fig. 1, where we assume that $\mathcal{D} = \{v_8, v_{12}\}$ and $v_1$ is the reference bus. The optimal protected meter set $\mathcal{P}^*$, which includes $r_4$, $r_{10}$ and $r_{12}$, is obtained from exhaustive search. The corresponding minimum Steiner tree $\mathcal{T}^*$ is plotted in Fig. 2. We see that conditions 1) and 2) are clearly satisfied. Condition 3) is satisfied by mapping edges $e_2$ and $e_{12}$ to injection meters (e.g. $e_2$ to $r_{12}$) and the other edges in $\mathcal{E}^*$ to the flow meters placed on them.

We show that the MMST problem is NP-hard by considering a special case where flow meters are installed on all edges of $G(\mathcal{M}) = (\mathcal{V}, \mathcal{E})$. Then, any Steiner tree that includes $R$ and $\mathcal{D}$ automatically satisfies the three conditions by mapping each edge to the corresponding flow meter. In this case, the MMST problem becomes a standard minimum Steiner tree (MST) problem, which finds the shortest subtree of the full graph that connects $R$ and all the vertices in $\mathcal{D}$. MST is a well-known NP-hard problem, and the time complexity of known exact algorithms increases exponentially with $|\mathcal{D}|$ or $|\mathcal{I}| - |\mathcal{D}|$ [20]. Since MST is a special case of the MMST problem, the MMST problem is also NP-hard by the standard reduction argument of computational complexity analysis. A special case of the MMST problem with $\mathcal{D} = \mathcal{I}$ is solved in [12] and [13] with time complexity $O(|\mathcal{V}||\mathcal{E}|)$. This special case is easy because $\mathcal{V}^* = \mathcal{V}$ holds automatically when all the state estimates are to be protected. The general MMST problem is much harder due to the combinatorial nature of the possible $\mathcal{V}^*$.

IV. Graphical Methods for Optimal Protection 

In this section, we first introduce two exact methods for solving the MMST problem, namely the Steiner vertex enumeration (SVE) method and a MILP formulation. Then, a tree-pruning heuristic is proposed to obtain an approximate solution in polynomial time.



A. Steiner vertex enumeration algorithm 

A vertex $v$ in the Steiner tree solution $\mathcal{T}^* = (\mathcal{V}^*, \mathcal{E}^*)$ is a terminal if $v \in \mathcal{D} \cup \{R\}$, and a Steiner vertex otherwise. The SVE method enumerates candidate sets of Steiner vertices $\mathcal{V}_0$ until a minimum observable subnetwork, including $\mathcal{V}_0$ and the terminals, is found. Then, $\mathcal{P}^*$ can be obtained by removing redundant measurements in the subnetwork using Gauss-Jordan elimination. A pseudo-code of SVE is presented in Algorithm 1. Since up to $2^{|\mathcal{I}|-|\mathcal{D}|}$ candidate sets $\mathcal{V}_0 \subseteq \mathcal{I} \setminus \mathcal{D}$ may need to be examined, the time complexity of SVE grows exponentially with $|\mathcal{I}| - |\mathcal{D}|$, which makes it computationally infeasible in large-scale power networks, e.g. a 118-bus system. Therefore, we mainly use SVE as the performance benchmark to evaluate the algorithms proposed in the following subsections.

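Algorithm 1: Steiner vertex enumeration algorithm

input : $\mathcal{I}$, $\mathcal{D}$, $\mathcal{M}$, $R$

output: Minimum protected measurements $\mathcal{P}^*$ to defend $\mathcal{D}$

repeat

    Enumerate a set of Steiner vertices $\mathcal{V}_0 \subseteq \mathcal{I} \setminus \mathcal{D}$, in increasing size from $|\mathcal{V}_0| = 0$ to $|\mathcal{I}| - |\mathcal{D}|$. Let $\mathcal{S} = \mathcal{D} \cup \mathcal{V}_0$;

    Find the meters $\mathcal{P}$ that measure only the buses in $\mathcal{S} \cup \{R\}$;

until $\operatorname{rank}(H_{\mathcal{P},\mathcal{S}}) = |\mathcal{S}|$;

$\mathcal{P}^*$ = a basic measurement set of $\mathcal{S}$;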
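A runnable sketch of Algorithm 1 on a tiny network is given below. It assumes a hypothetical 4-bus graph with unit susceptances, meters encoded as `("flow", (i, j))` or `("inj", bus)`, and builds each DC Jacobian row on the fly; the basic measurement set is extracted by keeping each row that raises the rank, which is one simple way to realize the Gauss-Jordan step.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    M = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        piv = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for r in range(len(M)):
            if r != rk and M[r][col] != 0:
                f = M[r][col] / M[rk][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[rk])]
        rk += 1
    return rk


def measured_buses(meter, edges):
    """Buses measured by a meter (Definition 1)."""
    kind, key = meter
    if kind == "flow":
        return set(key)
    return {b for e in edges if key in e for b in e}  # injection at bus `key`


def meter_row(meter, edges):
    """DC Jacobian row of a meter as {bus: coefficient}, unit susceptances."""
    kind, key = meter
    row = {}
    if kind == "flow":
        i, j = key
        row[i], row[j] = 1, -1
    else:
        for i, j in edges:
            if key in (i, j):
                other = j if i == key else i
                row[key] = row.get(key, 0) + 1
                row[other] = row.get(other, 0) - 1
    return row


def sve(buses, edges, meters, D, R):
    """Algorithm 1: try Steiner vertex sets V0 in increasing size."""
    others = [b for b in buses if b != R and b not in D]
    for size in range(len(others) + 1):
        for V0 in combinations(others, size):
            S = sorted(set(D) | set(V0))
            allowed = set(S) | {R}
            P = [m for m in meters if measured_buses(m, edges) <= allowed]
            rows = [[meter_row(m, edges).get(b, 0) for b in S] for m in P]
            if rank(rows) == len(S):
                basis, chosen = [], []
                for m, r in zip(P, rows):        # keep rows that raise the rank
                    if rank(basis + [r]) > len(basis):
                        basis.append(r)
                        chosen.append(m)
                return chosen
    return None


edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
meters = [("flow", (0, 1)), ("flow", (1, 2)), ("flow", (2, 3)), ("inj", 3)]
print(sve([0, 1, 2, 3], edges, meters, D=[2], R=0))
# [('flow', (0, 1)), ('flow', (1, 2))]
```

Here bus 2 cannot be protected on its own, so SVE adds the Steiner vertex 1 and returns the two flow meters on the path from the reference.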



B. Mixed integer linear programming formulation 

In this subsection, we propose a MILP formulation of the MMST problem, which has much lower complexity than SVE because it exploits the structure of the optimal solution. Consider a digraph $\bar{G} = (\mathcal{V}, \mathcal{A})$ constructed by replacing each edge in the measured full graph $G(\mathcal{M}) = (\mathcal{V}, \mathcal{E})$ with two arcs in opposite directions. We set the reference bus as the root and allocate one unit of demand to each vertex in $\mathcal{D}$. Commodities are sent from the root to the vertices in $\mathcal{D}$ through some arcs. Then, the vertices in $\mathcal{D}$ are connected to $R$ via the used arcs if and only if all the demand is satisfied. When we require the minimum number of arcs to deliver the commodities, the used arcs form a directed tree, referred to as a Steiner arborescence. Evidently, the solution to the MMST problem can be obtained by solving the following minimum measured Steiner arborescence (MMSA) problem and neglecting the orientations of the arcs. Without causing confusion, we say an arc $(i, j)$ is measured by a meter if the edge $[i, j]$ in $G(\mathcal{M})$ is measured by the meter.

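MMSA problem: Given a digraph $\bar{G} = (\mathcal{V}, \mathcal{A})$, find the shortest arborescence $\mathcal{T}^* = (\mathcal{V}^*, \mathcal{A}^*)$ and a set of meters $\mathcal{P}^* \subseteq \mathcal{M}$ that satisfy the following conditions:

1) $\mathcal{V}^*$ is the set of all vertices measured by $\mathcal{P}^*$;

2) $\mathcal{D} \subseteq \mathcal{V}^*$ and $R \in \mathcal{V}^*$;

3) each arc in $\mathcal{A}^*$ is one-to-one mapped to a unique meter in $\mathcal{P}^*$ that measures it.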

From condition 1), if an arc in $\mathcal{T}^*$ is mapped to an injection meter, all the vertices measured by the injection meter must also be included in the arborescence like the terminals, as if an extra demand were allocated at these vertices. To distinguish it from the actual demand at $\mathcal{D}$, we refer to the extra demand induced by the use of injection meters as pseudo demand. Then, the MMSA problem is to satisfy both the actual and the pseudo demand using the minimum number of arcs.

For an arc $(i, j) \in \mathcal{A}$, let $x_{ij}$ be a binary variable with $x_{ij} = 1$ indicating that the arc is included in $\mathcal{T}^*$ and $x_{ij} = 0$ otherwise; let $y_{ij}$ denote the total amount of commodity through $(i, j)$; and let $z_{ij}$ be a binary variable with $z_{ij} = 1$ indicating that the injection meter at vertex $i$ is mapped to arc $(i, j)$ or $(j, i)$, and $z_{ij} = 0$ otherwise. Then, a MILP formulation of the MMSA problem is



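$$
\begin{aligned}
\min_{X,Y,Z}\quad & \sum_{(i,j)\in\mathcal{A}} x_{ij} + \frac{1}{w}\sum_{(i,j)\in\mathcal{A}} z_{ij} && \text{(10a)}\\
\text{s.t.}\quad & y_{ij} \le w\, x_{ij}, \quad \forall (i,j)\in\mathcal{A} && \text{(10b)}\\
& 1_{\mathcal{E}}(i,j) + z_{ij} + z_{ji} \ge x_{ij}, \quad \forall (i,j)\in\mathcal{A} && \text{(10c)}\\
& \sum_{(i,j)\in\mathcal{A}} z_{ij} \le 1_{\mathcal{V}}(i), \quad \forall i\in\mathcal{V} && \text{(10d)}\\
& \sum_{(i,j)\in\mathcal{A}} y_{ij} - \sum_{(j,k)\in\mathcal{A}} y_{jk} = d(j), \quad \forall j\in\mathcal{V}\setminus\{R\} && \text{(10e)}\\
& x_{ij}, z_{ij}\in\{0,1\},\ y_{ij}\ge 0, \quad \forall (i,j)\in\mathcal{A}. && \text{(10f)}
\end{aligned}
$$

Here, $w$ is chosen as a large positive number such that $w > \sum_{(i,j)\in\mathcal{A}} z_{ij}$ and $w > y_{ij}$ always hold. $1_{\mathcal{E}}(i,j)$ and $1_{\mathcal{V}}(i)$ are two binary indicator functions, where $1_{\mathcal{E}}(i,j) = 1$ if a flow meter is available on edge $[i,j]$ and $1_{\mathcal{V}}(i) = 1$ if an injection meter is available at $v_i$. $d(j)$ is the demand at vertex $j$, where

$$
d(j) = \begin{cases}
1 + \sum_{(j,k)\in\mathcal{A}} z_{jk} + \sum_{[k,j]\in\mathcal{E}} \sum_{(k,s)\in\mathcal{A}} z_{ks}, & j \in \mathcal{D},\\
\sum_{(j,k)\in\mathcal{A}} z_{jk} + \sum_{[k,j]\in\mathcal{E}} \sum_{(k,s)\in\mathcal{A}} z_{ks}, & j \notin \mathcal{D}.
\end{cases}
$$

For $j \notin \mathcal{D}$, $d(j)$ is the total pseudo demand. Otherwise, one extra unit of actual demand is counted as well.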

As we can see, there are two terms in (10a), each corresponding to one objective. The first term minimizes the total number of arcs included in the arborescence. The second term minimizes the number of injection measurements. Notice that the first objective is primary: the second term in (10a) is always dominated by the first one due to the scaling factor $1/w$, which keeps the second term below 1. As such, (10a) minimizes the total number of arcs in the arborescence while eliminating redundant injection measurements, such as the case when two injection measurements are assigned to the same arc. Constraint (10b) forces arc $(i,j)$ to be included in $\mathcal{T}^*$ if any commodity flow passes through $(i,j)$. Constraints (10c) and (10d) ensure that each arc $(i,j)$ included in $\mathcal{T}^*$ has at least one measurement assigned to it and that each injection measurement is assigned to at most one arc. The flow conservation constraint (10e), together with (10b), forces the selected arcs to form an arborescence rooted at the reference vertex and spanning all vertices with positive demand. Once the optimal solution to (10) is obtained, we can restore the optimal solution $\mathcal{P}^*$ to the MMST problem by including: 1) the injection measurement on bus $i$ if $z_{ij} = 1$ for some $(i, j) \in \mathcal{A}$; 2) the flow measurement on arc $(i,j)$ if $x_{ij} = 1$ and $z_{ij} + z_{ji} = 0$, $\forall (i,j) \in \mathcal{A}$, i.e. for the arcs in $\mathcal{T}^*$ not mapped to any injection measurement.
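The two restoration rules above can be sketched as a small post-processing function. The dictionary encoding of the MILP variables (`x[(i, j)]`, `z[(i, j)]` as 0/1 values) and the meter tuples are hypothetical; no particular MILP solver interface is assumed.

```python
def restore_meters(x, z):
    """Recover P* from an MILP solution (hypothetical dict encoding).

    x[(i, j)] = 1 if arc (i, j) is in the arborescence;
    z[(i, j)] = 1 if the injection meter at bus i is mapped to arc (i, j) or (j, i)."""
    meters = set()
    for (i, j), used in z.items():
        if used:
            meters.add(("inj", i))                          # rule 1)
    for (i, j), used in x.items():
        if used and not z.get((i, j), 0) and not z.get((j, i), 0):
            meters.add(("flow", frozenset((i, j))))         # rule 2)
    return meters


# Arborescence 0 -> 1 -> 2 -> 3 where arc (2, 3) is covered by the injection at bus 3.
x = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 2): 0}
z = {(3, 2): 1}
P_star = restore_meters(x, z)
print(len(P_star))  # 3
```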

Extensive simulations show that the MILP formulation always obtains the same optimal solution as the SVE algorithm. The detailed experiment setup is omitted due to the page limit. The MILP significantly reduces the computational complexity by exploiting the solution structure. For instance, a problem in a 57-bus system that is computationally infeasible for the SVE algorithm can now be solved by the MILP within minutes. Nonetheless, the computational complexity of state-of-the-art MILP algorithms, such as branch-and-bound and cutting-plane methods, still grows exponentially with the problem size. We observe from simulations that it takes an excessively long time to solve the problem in a 300-bus power system.

C. Tree pruning heuristic 

To tackle the intractability of the problem, we propose a tree-pruning based heuristic (TPH) that finds an approximate solution in polynomial time. We call a tree $\mathcal{T} = (\mathcal{V}', \mathcal{E}')$, along with a set of measurements $\mathcal{P}$, a feasible measured tree if $\mathcal{T}$ and $\mathcal{P}$ satisfy the conditions of the MMST problem. Our observation is that, although it is hard to find an MMST, it is relatively "easy" to find a feasible tree that includes all the vertices in the graph using the techniques in [12]. Starting from a feasible measured tree that spans all vertices in the measured full graph, TPH iteratively prunes away redundant vertices and updates the feasible tree until no further vertex can be pruned. A pseudo-code is provided in Algorithm 2. TPH consists of multiple rounds of pruning operations. Here, we explain one round of pruning, which corresponds to lines 2-8 of the pseudo-code, in the following four steps.

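Algorithm 2: Tree pruning heuristic algorithm

input : $G(\mathcal{M}) = (\mathcal{V}, \mathcal{E})$, $\mathcal{D}$, $R$, $K$

output: Minimum protected measurements $\mathcal{P}^*$ to defend $\mathcal{D}$

1 initialization: $\mathcal{V}' = \mathcal{V}$;

2 repeat

3     Let $W = |\mathcal{V}'|$. Find $K$ basic measurement sets of $\mathcal{V}' \setminus \{R\}$, denoted by $\mathcal{P}^k$, $k = 1, \ldots, K$. For each $\mathcal{P}^k$, construct a feasible measured tree $\mathcal{T}_k$;

4     for each $\mathcal{T}_k$ do

5         Starting from $R$ toward all leaf vertices, find the largest prunable subset $C_s^*(i)$ for each $v_i$. Update $\mathcal{T}_k = \mathcal{T}_k \setminus \{C_s^*(i) \cup D(C_s^*(i))\}$ until each vertex in $\mathcal{T}_k$ is either processed or pruned;

6     end

7     Select the minimum tree $\mathcal{T}^*$ and update $\mathcal{V}'$ by letting $\mathcal{V}'$ = the vertices in $\mathcal{T}^*$;

8 until $W = |\mathcal{T}^*|$;

9 $\mathcal{P}^*$ = the remaining measurements corresponding to $\mathcal{T}^*$;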



Step 1: Feasible tree generation. For a set of vertices $\mathcal{V}'$ (initially set to $\mathcal{V}$), we generate $K$ feasible measured trees, where $K$ is a tunable parameter (lines 3-4). In this step, we first find the meters that measure only the vertices in $\mathcal{V}'$. Among them, we find $K$ basic measurement sets of $\mathcal{V}' \setminus \{R\}$, denoted by $\mathcal{P}^k$ ($k = 1, \ldots, K$), using Gauss-Jordan elimination. Then, we construct $K$ feasible spanning trees $\mathcal{T}_k = (\mathcal{V}', \mathcal{E}^k)$, one for each $\mathcal{P}^k$, using the max-flow method given in the Appendix.
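One way to obtain several different basic measurement sets, sketched below under our own assumptions, is to run the rank-increasing row selection on randomly shuffled meter orders. The input `rows` (meter name to Jacobian row over the columns of $\mathcal{V}' \setminus \{R\}$, already restricted to meters that measure only $\mathcal{V}'$), the seed, and the bound of $10K$ trials are all illustrative choices, not part of the original method.

```python
import random
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    M = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        piv = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for r in range(len(M)):
            if r != rk and M[r][col] != 0:
                f = M[r][col] / M[rk][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[rk])]
        rk += 1
    return rk


def basic_sets(rows, K, seed=0):
    """Up to K distinct basic measurement sets via greedy selection in random orders."""
    rng = random.Random(seed)
    names = sorted(rows)
    found = []
    for _ in range(10 * K):            # bounded number of random trials
        rng.shuffle(names)
        basis, chosen = [], []
        for n in names:                # keep a row only if it raises the rank
            if rank(basis + [rows[n]]) > len(basis):
                basis.append(rows[n])
                chosen.append(n)
        s = frozenset(chosen)
        if s not in found:
            found.append(s)
        if len(found) == K:
            break
    return found


# Hypothetical 4-bus example: columns theta1, theta2, theta3 (bus 0 is R).
rows = {"flow_0_1": [-1, 0, 0], "flow_1_2": [1, -1, 0],
        "flow_2_3": [0, 1, -1], "inj_3": [0, -1, 2]}
for s in basic_sets(rows, K=3):
    print(sorted(s))
```

Each returned set has exactly $|\mathcal{V}'| - 1$ meters with linearly independent rows; randomizing the order is simply a cheap way to diversify the $K$ starting trees.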

Fig. 3. A measured feasible tree. {v1, v5, v8} are the terminals and v1 is
the reference. The two marked edges ([4, 6] and [9, 11]) are mapped to
injection meters, and the other unmarked edges are mapped to flow meters.

Step 2: Vertex identification. For each tree T_k, we identify
the child and descendant vertices of each vertex (part of
lines 5-6 in Algorithm 2). This can be achieved by constructing
a directed tree from the root to all leaf vertices. If there is an
arc (i, j), we say that v_j is a child of v_i, denoted by v_j ∈ C(i).
In general, if there exists a path from v_i to v_j, we refer to v_j
as a descendant of v_i, denoted by v_j ∈ D(i). In Fig. 3, for
instance, v6 and v7 are the child vertices of v4, while v6 to
v13 are all descendant vertices of v4.
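A minimal sketch of Step 2, assuming T_k is given as an undirected edge list over integer vertex labels: a breadth-first search from the root orients every edge away from R, which yields the child sets C(i), and a depth-first walk over those child lists collects D(i). The function names and the input format are hypothetical.

```python
from collections import deque


def orient_tree(edges, root):
    """Orient an undirected spanning tree away from `root`.
    Returns a dict mapping each vertex v_i to its child list C(i)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    children = {v: [] for v in adj}
    seen, queue = {root}, deque([root])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in seen:  # arc (i, j): v_j is a child of v_i
                seen.add(j)
                children[i].append(j)
                queue.append(j)
    return children


def descendants(children, i):
    """D(i): every vertex reachable from v_i along the directed arcs."""
    out, stack = set(), list(children[i])
    while stack:
        j = stack.pop()
        out.add(j)
        stack.extend(children[j])
    return out
```

Orienting the tree once per T_k lets every later query for C(i) or D(i) be answered from the child lists without revisiting the undirected edges.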

Step 3: Tree pruning. For each T_k, we proceed from the root toward
the leaf vertices to prune away redundant vertices (lines 5-6).
For a vertex v_i, we find the largest prunable subset C*(i) ⊆
C(i) such that the residual tree is still a feasible measured
tree after all the vertices in {C*(i) ∪ D(C*(i))} are pruned.
In particular, a subset C_s(i) ⊆ C(i) is prunable, i.e.,
{C_s(i) ∪ D(C_s(i))} can be pruned, if:

1) {C_s(i) ∪ D(C_s(i))} contains no terminal vertex;

2) the deletion of {C_s(i) ∪ D(C_s(i))} removes all the
edges mapped to injection meters that measure any vertex in
{C_s(i) ∪ D(C_s(i))}.

Then, we update T_k by removing all the vertices in {C*(i) ∪
D(C*(i))} and proceed to the next vertex, until each vertex in
𝒱 is either checked or pruned.
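The pruning rule of Step 3 can be sketched as follows, under several assumptions: the tree is a dict of child lists (as obtained by orienting T_k from the root), `inj_edges` maps each edge assigned to an injection meter to the set of vertices that meter measures, and "largest" is read as maximum cardinality, found by enumerating subsets of C(i) from largest to smallest (exponential in the number of children, which is usually small). All names and the toy data are hypothetical.

```python
from collections import deque
from itertools import combinations


def descendants(children, v):
    """D(v): all vertices below v in a rooted tree given as child lists."""
    out, stack = set(), list(children.get(v, []))
    while stack:
        u = stack.pop()
        out.add(u)
        stack.extend(children.get(u, []))
    return out


def can_prune(children, subset, terminals, inj_edges):
    """Check conditions 1) and 2) for deleting subset ∪ D(subset)."""
    removed = set(subset)
    for c in subset:
        removed |= descendants(children, c)
    if removed & terminals:  # condition 1): no terminal may be deleted
        return False
    for (a, b), measured in inj_edges.items():  # condition 2)
        if measured & removed and a not in removed and b not in removed:
            return False  # this injection's edge would survive the deletion
    return True


def largest_prunable_subset(children, i, terminals, inj_edges):
    """C*(i): a maximum-cardinality prunable subset of C(i)."""
    kids = children.get(i, [])
    for size in range(len(kids), 0, -1):
        for subset in combinations(kids, size):
            if can_prune(children, subset, terminals, inj_edges):
                return set(subset)
    return set()


def prune_tree(children, root, terminals, inj_edges):
    """One top-down pruning pass; returns the surviving vertex set."""
    children = {v: list(cs) for v, cs in children.items()}
    inj_edges = dict(inj_edges)
    queue = deque([root])
    while queue:
        i = queue.popleft()
        best = largest_prunable_subset(children, i, terminals, inj_edges)
        removed = set(best)
        for c in best:
            removed |= descendants(children, c)
        children[i] = [c for c in children.get(i, []) if c not in best]
        for v in removed:
            children.pop(v, None)
        # An edge (and the injection mapped to it) disappears with an endpoint.
        inj_edges = {e: m for e, m in inj_edges.items() if not set(e) & removed}
        queue.extend(children[i])
    return {root} | descendants(children, root)


if __name__ == "__main__":
    # Hypothetical tree loosely modeled on the Fig. 3 walk-through.
    tree = {1: [2, 3, 4], 3: [5], 4: [6, 7], 6: [8], 7: [9], 9: [10, 11]}
    inj = {(4, 6): {4, 6, 7, 8}, (9, 11): {9, 11}}
    print(sorted(prune_tree(tree, 1, {1, 5, 8}, inj)))  # [1, 3, 4, 5, 6, 7, 8]
```

In this toy run, vertex 2 and everything below vertex 9 are pruned, while vertex 7 survives because deleting it would leave the injection-mapped edge (4, 6) in place.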

Step 4: Vertex update. Let |T_k| be the number of remaining
vertices in T_k. Then, we select among the K trees the one
with the fewest vertices, denoted by T*. If |T*| = |𝒱|, i.e.,
no vertex is removed in any of the K trees, we terminate the
algorithm and output 𝒫* as the remaining meters in T* (lines
7-9). Otherwise, we update 𝒱 as the remaining vertices in T*
and start another round of pruning from Step 1.
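The round structure of Algorithm 2 (lines 2-8) reduces to a fixed-point loop over the vertex set. In the sketch below, tree construction (Step 1) and per-tree pruning (Steps 2-3) are passed in as hypothetical callables and only vertex sets are tracked; mapping T* back to its measurements 𝒫* is left out.

```python
def tree_pruning_rounds(vertices, build_trees, prune, K):
    """Repeat rounds of pruning until no tree shrinks (lines 2-8).

    build_trees(current, K) -> K feasible trees on the current vertex set
    prune(tree)             -> vertex set of the residual tree
    Both callables are hypothetical stand-ins for Steps 1-3."""
    current = set(vertices)
    while True:
        W = len(current)  # line 3: W = |V|
        residuals = [prune(tree) for tree in build_trees(current, K)]
        best = min(residuals, key=len)  # T*: the smallest residual tree
        if len(best) == W:  # line 8: no vertex removed in any tree
            return set(best)
        current = set(best)  # line 7: keep only the vertices of T*
```

Each non-terminating round removes at least one vertex, so the loop always terminates as long as pruning never adds vertices.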

In Fig. 3, we present an example to illustrate TPH. Starting
from the root v1, among the three child vertices of v1, only v2
can be pruned, since the descendant vertices of both v3 and v4
contain a terminal vertex. After pruning v2, we proceed to check
v3, whose only child vertex is a terminal. Then, we check v4,
where neither of its child vertices v6 and v7 can be pruned,
either separately or together. This is because v6 has a terminal
among its descendant vertices, and the removal of v7 does not remove
the edge [4, 6], which is mapped to the injection meter at v6
that measures v7. For v7, however, all of its descendant vertices
can be pruned according to the vertex pruning conditions. At this
point, we have finished the first round of pruning. Then, we use
the remaining vertices {v1, v3, v4, v5, v6, v7, v8} to generate
new feasible trees, if any, and repeat the pruning operations
iteratively until no vertex can be further pruned.

The parameter K is introduced because the final output 𝒫*
depends closely on the topology of the trees obtained in Step 1.
Intuitively, with a larger K, we have a better chance of obtaining
a smaller |𝒫*|, at the cost of more computation. The proper choice
of K is discussed in the simulations. The correctness of TPH
follows from these facts: 1) the K residual trees are always
feasible measured trees; 2) the size of the minimum residual tree
is non-increasing over the iterations; 3) |𝒫*| equals the size
of the minimum residual tree. There are at most |V| − |𝒟|
rounds of pruning. In each round, K trees are pruned, and
each takes O(|V|^3) time, dominated by the Gauss-Jordan
elimination. The overall time complexity is therefore
O(K|V|^4), which is considered efficient even for very
large-scale power systems.

TABLE I
                                  14-bus   57-bus   118-bus
No. of lines                          20       80       186
Total no. of measurements             19       80       180
No. of injection measurements          8       30        70
No. of flow measurements              11       50       110
No. of unmeasured lines                2        2         7

V. Simulation Results 

In this section, we use simulations to evaluate the proposed
defending mechanisms. All the computations are performed in
MATLAB on a computer with an Intel Core2 Duo 3.00-GHz
CPU and 4 GB of memory. In particular, the MatlabBGL
package [21] is used to solve some of the graph problems,
such as the maximum-flow calculation. In addition, Gurobi [22] is
used to solve the MILP problems. The power systems we
consider are the IEEE 14-bus, 57-bus and 118-bus testcases,
whose topologies are obtained from MATPOWER [23] and
summarized in Table I. All the systems are observable with their
respective measurement placements. For illustration purposes, a
measurement placement of the 14-bus system is plotted in
Fig. 1. The measurement placements for the 57-bus and 118-bus
systems are omitted for brevity.

We first evaluate the computational complexity of TPH in
Fig. 4, using MILP as the benchmark for comparison. For
TPH, we set the parameter K = 1 and record the total
number of vertices that are checked to produce a solution.
For MILP, we record the number of nodes explored in the
search tree by the branch-and-bound algorithm. Both numbers
count the iterations consumed by the two methods to obtain
a solution. In addition, we record the CPU time of both
methods. The results in Fig. 4 are averaged over
50 independent experiments. Without loss of generality, we
randomly generate a 𝒟 with size |𝒟| = 4 in each experiment.
In Fig. 4(a), we show the average number of iterations for the
14-bus, 57-bus and 118-bus systems. The iteration numbers of
the two methods are very close in the 14-bus system, where TPH
consumes 38 iterations and MILP consumes 47 iterations to
obtain a solution. However, the difference becomes increasingly
significant as the network size grows. The number of iterations
of TPH increases by a factor of 11 as the network size grows
from 14 to 118 buses. In sharp contrast, the iteration number of
MILP increases rapidly by a factor of 2272, from merely 47 to
106787. Similar results are observed for the CPU time: TPH
takes only 0.485 seconds to obtain a solution in the 118-bus
system, while MILP takes around 5 minutes, which is 1410 times
its CPU time in the 14-bus system. The rapidly growing
computational complexity of the MILP method is due to the
NP-hardness of solving a MILP. The computational cost of the
MILP method can be expected to become prohibitive as the
network size increases further. For instance, the projected
CPU time of MILP to solve a problem in a 300-bus system is
more than 5 days, while TPH takes less than 2 seconds.

Fig. 4. Comparison of computational complexity for MILP and TPH.
(a) The upper panel shows the average number of iterations to obtain a
solution; (b) the lower panel shows the average CPU time to obtain a solution.

We also investigate the impact of the parameter K on the
performance of TPH. By varying the values of K and |𝒟|, we
show in Table III the average solution size |𝒫| of TPH and
MILP. Each entry of the table is the average over
50 independent experiments. From the 2nd to the 6th rows, we
see that a better solution, i.e., a smaller |𝒫|, is obtained with a
larger K. Compared with the optimal solution 𝒫* obtained by MILP,
TPH protects on average only 1.13 more meters when K = 15.
The optimality gap is less than 10% for all the cases. For better
visualization, we plot the ratio |𝒫|/|𝒫*| for some selected
values of |𝒟| in Fig. 5(a). We notice that the ratio improves
notably for small |𝒟| as K increases from 1 to 15. For instance,
the ratio improves from 1.32 to 1.04 for |𝒟| = 1. The improvement
is especially notable when K increases from 1 to 3. However,
the improvement becomes marginal as we further increase K,
and in the case with |𝒟| = 49, the ratio improves by only
0.03 from K = 1 to 15. We also plot in Fig. 5(b) the CPU
time normalized against the time consumed when K = 1. We
observe that the CPU time increases almost linearly with K,
which matches our analysis in Section IV. The results in Fig. 5
indicate that K should be chosen to balance the quality of the
approximate solution against the computational complexity.
In particular, a large K, such as K = 10, should be used when
|𝒟| is small relative to n, while a small K, such as K = 3,
should be used when |𝒟| is relatively large.

Fig. 5. Effect of K on the performance of TPH in the 57-bus system.
(a) The upper panel shows the solution size of TPH normalized by the optimal
solution size obtained by MILP; (b) the lower panel shows the CPU time of
TPH normalized by the CPU time when K = 1.



VI. Conclusions 

In this paper, we used graphical methods to study defending
mechanisms that protect a set of state estimates from false-
data injection attacks. By characterizing the optimal protection
problem as a variant Steiner tree problem, we proposed both
exact and approximate algorithms to select the minimum
number of measurements for system protection. The advantageous
performance of the proposed defending mechanisms has been
evaluated in IEEE standard power system testcases.

Appendix 
Maximum-flow method for tree construction 

We use the example in Fig. 1 to illustrate the method
for obtaining a feasible spanning tree. Consider a basic
measurement set 𝒫 = {r1, r6, r12, r14} of 𝒱 \ R, where
𝒱 = {v1, v2, v4, v5, v6} and R = v1. The set of edges
measured by 𝒫 is S = {e1, e2, e5, e7, e10}. Then, a directed
graph is constructed in Fig. 6, where v1 is chosen as the
root of the spanning tree. We select in advance an
edge connected to the root, say e1, to be included in the final
tree. This is achieved by setting both the lower and upper
capacity bounds of that edge to 1. The lower and upper capacity
bounds of the other edges are set to 0 and 1, respectively.
Then, a maximum flow is calculated from s to t. If the
problem is feasible, i.e., the flow on edge e1 is 1,
we obtain a measurement-to-edge mapping by observing the
saturated flows in the graph. Otherwise, we select another
edge connected to the root and recalculate the maximum flow.
Since 𝒱 is observable from 𝒫, there is always a
solution. In the above example, the final measurement-to-edge
mapping is {r1, r6, r12, r14} ↔ {e1, e10, e2, e7}. The
edges obtained by the maximum flow calculation then form
a tree that spans all vertices in 𝒱.
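The directed flow network of Fig. 6 is not fully specified here, so the sketch below swaps in a simplified stand-in: a unit-capacity bipartite matching (Kuhn's augmenting paths) between meters and the edges they measure, where the pre-selected root edge is forced by pre-assigning it to a meter that measures it, playing the role of the lower bound of 1. Because this matching alone does not enforce tree structure, a union-find check then tests whether the mapped edges form a spanning tree. Meter and edge names, and the input dicts, are hypothetical.

```python
def assign_measurements(measures, forced_edge):
    """Map each meter to a distinct edge it measures, using `forced_edge`.

    measures: dict meter -> set of edges that meter measures.
    Returns dict meter -> edge, or None if no such assignment exists."""

    def augment(m, owner, seen):
        # Kuhn's augmenting-path step: find an edge for meter m,
        # re-routing the current owner of that edge if necessary.
        for e in sorted(measures[m]):
            if e in seen:
                continue
            seen.add(e)
            if e not in owner or augment(owner[e], owner, seen):
                owner[e] = m
                return True
        return False

    for first in measures:
        if forced_edge not in measures[first]:
            continue
        owner = {forced_edge: first}  # acts like a lower bound of 1 on it
        if all(augment(m, owner, {forced_edge})
               for m in measures if m != first):
            return {m: e for e, m in owner.items()}
    return None  # try another root edge, as in the text


def is_spanning_tree(edge_ends, chosen, vertices):
    """Union-find check that the chosen edges form a tree spanning `vertices`."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    if len(chosen) != len(vertices) - 1:
        return False
    for e in chosen:
        a, b = (find(x) for x in edge_ends[e])
        if a == b:
            return False  # this edge would close a cycle
        parent[a] = b
    return True
```

For instance, with meters rA measuring {e1, e3} and rB measuring {e1, e2}, forcing e1 yields {rA: e1, rB: e2}; if e1 = (1, 2) and e2 = (2, 3), these two edges span {1, 2, 3}.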